An Introduction to Image Processing (Nov. 06, 2017)


We will study some specific ideas in the following general topics in image processing.


Digital image fundamentals

The term image refers to a 2D light-intensity function denoted by \(f(x,y)\), where the value or amplitude of \(f\) at spatial coordinates \((x,y)\) gives the intensity (brightness) of the image at that point.

The basic nature of \(f(x,y)\) can be characterized by two components:

  1. The amount of source light incident (illumination) on the scene:

\[ i(x,y) \text{ where } 0\leq i(x,y) \leq \infty \]

  1. The amount of light reflected (reflectance) by the objects:

\[ r(x,y) \text{ where } 0\leq r(x,y) \leq 1 \]

Total absorption \(r(x,y) = 0\) and \(r(x,y) = 1\) is never achieved.

The functions \(i(x,y)\) and \(r(x,y)\) combine as a product:

\[ f(x,y) = i(x,y)r(x,y) \text{ and hence } 0 \leq f(x,y) \leq \infty \]

In order for a computer to process an image, it has to be described as a series of numbers, each of finite precision. The digitization of \(f(x,y)\) is called:

  1. Image sampling when it refers to spatial coordinates \((x,y)\) and

  2. Quantisation when it refers to the amplitude of \(f(x,y)\)

The images are thus only sampled at a discrete number of locations with a discrete set of brightness levels.

The following is the height profile of Switzerland and sub-sampled height profile of Swiss.



Similarly, we can quantize the intensity along the red line:



to get the quantized version,



When considered together, the digitization process requires making decision about:

  1. the size of the image array \(N\times M\) and

  2. the number of discrete grey-levels \(G\) allowed for each pixel, \(f(x,y)\)

Thus,



is an image.

In digital image processing these quantities are usually powers of two, thus,

\(N = 2^n\), \(M = 2^m\) and \(G = 2^k\) for some \(n, m \text{ and } k\).

How many samples and grey-levels are required for a good approximation?

  1. Resolution (degree of discernible detail) of an image depends on the number of samples and grey-levels

  2. The bigger these parameters, the closer the digitized array approximates the original image

  3. However, the storage and processing time increases rapidly.


Relationship between the pixels

Quantisation alone does not imply a spatial structure → it must be defined. We have to consider topology and metrics as well. Neighborhood is defined via metrics and vice-versa and are defined on the grid. In 2D they are defined as \(4\)-, \(8\)- or mixed-neighborhoods. We will see them now.

But before that we will define the following:

Digital image is denoted by \(f(x,y)\), pixels as \(p,q\) and subset of pixels of \(f(x,y)\) as \(S\)

Definitions

\(4\)-Neighbours: A pixel \(p\) at spatial position \((x,y)\) has \(4\) neighbors if \(S\) is defined as:

\(S:(x+1,y),(x-1,y),(x,y+1),(x,y-1)\)

This set of pixels is called the \(4\)-neighborhood of \(p: S=N_4(p)\). Pictorially,



Diagonal Neighbours: The diagonal neighbors of \(p\) are \(N_D(p)\) is defined as the set \(S\):

\(S:(x+1,y+1),(x-1,y+1),(x+1,y-1),(x-1,y-1)\)



8-Neighbourhood: The set theoretic sum of \(N_4(p)\) and \(N_8(p)\).

\(S: N_4(p)+N_D(p)\rightarrow N_8(p)\)

Thus,



Connectivity

Connectivity between pixels is important in several areas of image processing where we need to identify regions of interest, segment etc. Important to the idea of connectivity is esatblishing boundaries around objects and extract connected components in images.

Two pixels \(p,q\) are connected if:

  1. They are neighbors, e.g. \(N_4(p)\),\(N_8(p)\),…

  2. Their grey values satisfy a specified criterion of similarity, e.g. in a binary image they have the same value of either \(0\) or \(1\)

Let V be the set of grey-level values used to define connectivity; for example in a binary image \(V=\{1\}\) or in a grey-scale image \(V=\{16,17,...,32\}\). We can define two types of connectivity:

  1. \(4\)-connectivity if two pixels p,q with values from V and q is in \(N_4(p)\)

  2. \(8\)-connectivity if two pixels p,q with values from V and q is in \(N_8(p)\)


\(4\)-connectivity paradox


\(8\)-connectivity paradox

Solution

Foreground \(8\)-neighborhood + Background \(4\)-neighborhood


Fundamental steps in image processing


Signal processing background



Basic ideas

A periodic function can be represented by the sum of sines and cosines of different frequencies, multiplied by a different coefficient (Fourier Series)

Non-periodic functions can also be represented as the integral of sines/cosines multiplied by a weighting function (Fourier Transformation)


Fourier transform and its cousins

We will now see an example of a function and its fourier transform. The function is defined by,

\[ f(x) = \begin{cases} A, & \mbox{if } 0 \leq x \leq X \\ 0, & \mbox{otherwise} \end{cases} \]



Why?



Discrete Fourier Transform (DFT) is the discrete analog of the fourier transform. Since an image is a two-dimensional quantity, we need DFT in two dimensions.



DFT is generally calculated using the Fast Fourier Transform (FFT). FFT is an algorithm to compute DFT in a fast and efficient manner. In general DFT takes about \(O(N^2)\). Proper decomposition can reduce the number of multiplications and addition proportional to \(O(N\log_2N)\). This decomposition is called the fast Fourier Transform (FFT) algorithm.

For example, let’s assume that an FFT of size \(8,192\) takes on one particular machine \(1\) second. Using the DFT method the same Fourier Transform would require \(10\) minutes \(30\) seconds.


The dynamic range of Fourier spectra usually is much higher than the typical display device can reliably reproduce. The consequence is that only the brightest parts are shown. A useful technique that compensates for this difficulty is of displaying the following function:

\[ D(u,v) = c. \log[1+\lvert F(u,v) \lvert] \] We can see the application of this technique in the following voyager image:



Fourier transform applications

First we will investigate the “basis” functions for the Fourier Transform (FT). The FT tries to represent all images as a summation of cosine-like images. Therefore images that are pure cosines have particularly simple FTs. Thus we will see FT’s application to primitive images.


FT applied to primitive images


The images are a pure horizontal cosine of 8 cycles and a pure vertical cosine of 32 cycles. Notice that the FT for each just has a single component, represented by 2 bright spots symmetrically placed about the center of the FT image. The center of the image is the origin of the frequency coordinate system. The u-axis runs left to right through the center and represents the horizontal component of frequency. The v-axis runs bottom to top through the center and represents the vertical component of frequency. In both cases there is a dot at the center that represents the (0,0) frequency term or average value of the image. Images usually have a large average value (like 128) and lots of low frequency information so FT images usually have a bright blob of components near the center. Notice that high frequencies in the vertical direction will cause bright dots away from the center in the vertical direction. And that high frequencies in the horizontal direction will cause bright dots away from the center in the horizontal direction.



Here are 2 images of more general Fourier components. They are images of 2D cosines with both horizontal and vertical components. The one on the left has 4 cycles horizontally and 16 cycles vertically. The one on the right has 32 cycles horizontally and 2 cycles vertically. (Note: You see a gray band when the function goes through gray = 128 which happens twice/cycle.) You may begin to notice there is a lot of symmetry. For all REAL (as opposed to IMAGINARY or COMPLEX) images, the FT is symmetrical about the origin so the 1st and 3rd quadrants are the same and the 2nd and 4th quadrants are the same. If the image is symmetrical about the x-axis (as the cosine images are) 4-fold symmetry results.


In general, rotation of the image results in equivalent rotation of its FT. To see that this is true, we will take the FT of a simple cosine and also the FT of a rotated version of the same function. The results can be seen by:


At first, the results seem rather surprising. The horizontal cosine has its normal, very simple FT. But the rotated cosine seems to have an FT that is much more complicated, with strong diagonal components, and also strong “plus sign” shaped horizontal and vertical components. The question is, where did these horizontal and vertical components come from? The answer is that the FT always treats an image as if it were part of a periodically replicated array of identical images extending horizontally and vertically to infinity. And there are strong edge effects between the neighbors of such a periodic array as can be seen by:


Thus, what we see as the FT in the “slant” image (lower right of the image before last) is actually the combination of the actual FT of the cosine function and that caused by the edge effects of looking at a finite part of the image. These edge effects can be significantly reduced by “windowing” the image with a function that slowly tapers off to a medium gray at the edge. The result can be seen by:


The windowed image is shown in the upper left. Its FT is shown in the lower left. The non-windowed FT is shown in the upper right and the actual, true FT of a cosine is shown in the lower right. These images are all scaled differently and the comparison is only qualitative, but it can be seen that the windowed image FT is much closer to the true FT and eliminates many of the edge effects.